Load the Libraries

> # Load the Required Libraries
>   rm(list=ls()) # Remove previous work
>   library(rrBLUP)
>   library(BGLR)
>   library(AGHmatrix)
>   library(ggplot2)
>   library(DT)
>   #library(cvTools)
>   library(dplyr)
>   library(lme4)
>   library(arm)

This section shows the analysis of filtered phenotypic data in lme4 and other open source R packages. The filtered data set was obtained after pre-processing and Quality check of data


Phenotypic Data Analysis in lme4 R Package


  • Here in this section phenotypic data analysis is performed in an open source R package called lme4. More on this R package can be found here lme4 Tutorial 1, and lme4 Tutorial 2.

  • The purpose of this section is to repeat the phenotypic data analysis in lme4 as ASReml R package is commercial package and may not available for all the users.

  • Filtered data set will be used, same one used in ASReml R package to perform the analysis in lme4.

  • ANOVA, variance components, BLUPS, BLUES and heritability is extracted for the results part.

Upload the Filtered Phenotypic Data

>   demo.data.filtered<-read.csv(file="./Data_sets/demo.data.filtered.csv",
+                                header = TRUE)
> # factor conversion if below are not in factors 
>   columns<-c("Environment", "Genotype", "Rep", "Block", "Row", "Column", "Line.type")
>   demo.data.filtered[, columns]<-lapply(columns, function(x) as.factor(demo.data.filtered[[x]]))
>   demo.data.filtered$Yield<-as.numeric(demo.data.filtered$Yield)
>   demo.data.filtered$HT<-as.numeric(demo.data.filtered$HT)
>   demo.data.filtered$DTF<-as.numeric(demo.data.filtered$DTF)
> 
> # Subset the required columns
>   demo.data.filtered<-demo.data.filtered[, c("Environment", "Genotype", "Rep", 
+                                              "Block", "Row", "Column", "Line.type",
+                                              "Yield", "HT", "DTF")]
> # First we will arrange the rows and columns for spatial analysis.
> # Now we will subset the environments and Yields for analysis
>   demo.data.filtered<-data.frame(demo.data.filtered%>% group_by(Environment)%>%arrange(Row, Column)) # arrange by row and column
>   demo.data.filtered<-data.frame(demo.data.filtered%>% arrange(Environment)) # Arrange by environment
> 
> #demo.data.filtered<-demo.data.filtered[!demo.data.filtered$Environment %in% c("Env2", "Env5","Env8", "Env9"), ]
>   # View as table in file
> print_table <- function(table, ...){
+  datatable(table, extensions = 'Buttons',
+  options = list(scrollX = TRUE, 
+ dom = '<<t>Bp>',
+  buttons = c('copy', 'excel', 'pdf', 'print')), ...)
+ }
> print_table(demo.data.filtered, editable = 'cell', rownames = FALSE, caption = htmltools::tags$caption("Table: Showing Filtered Raw Data at Multiple Locations",style="color:black; font-size:130%"), filter = 'top')

Quick Visualization of Data

>   ggplot(data = demo.data.filtered, aes(x = Environment, y = Yield, fill = Rep))+
+   geom_boxplot()+
+   theme_classic()+
+   theme(axis.text.x = element_text(angle = 45, hjust = 1)) +# fill by timepoint to give different color
+   #scale_fill_manual(values = c("", ""))+
+   #scale_color_manual(values = c("", ""))
+   theme (plot.title = element_text(color="black", size=12,hjust=0.5, face = "bold"), # add and modify the title to plot
+   axis.title.x = element_text(color="black", size=12, face = "bold"), # add and modify title to x axis
+   axis.title.y = element_text(color="black", size=12, face="bold")) + # add and modify title to y axis
+ #scale_y_continuous(limits=c(0,15000), breaks=seq(0,15000,1000), expand = c(0, 0))+
+  theme(axis.text= element_text(color = "black", size = 10)) # modify the axis text

Single Stage/Step Wise Analysis

  • In this section, data analysis will be shown only for grain yield trait using a Linear Mixed-Model Approach in lme4 R Package package, and will be useful to the users who do not have access to the commercial ASReml-R package.

  • In general analysis pipeline is divided in two parts:

    1. Separate analysis/step-wise: In this each environment/trial is analyzed separately.

    2. Combined analysis or Multi-environment trial (MET) analysis: In this analysis all the environments will be analyzed jointly.

    • Various mixed models from basic to advanced models will be used will for MET analysis.

    • First let us subset the data for on environment to show how to perform the analysis for one trial or environment in lme4 R package

    • We will run models which are feasible in lme4 R package. Note spatial models are not possible to run in lme4 R package.

    • We will use basic models and show how to extract the results

Subset the Data for One Environment

  • Subset the data for one environment first.
> # Subset the environment 1
>   sub.data<-subset(demo.data.filtered, Environment=="Env1")
>   sub.data<-droplevels.data.frame(sub.data)

Run the Mixed model

Model 1.lme4

  • The model described below is equivalent to model 1 described in ASReml R package analysis.

\[ y_{ijk}= \mu+g_{i} + r_{j}+ b_{jk} + \epsilon_{ijk}\\ y_{ijk}= \text{ is the effect of $i$th genotype in $j$th replication and $k$th block within the $j$th replication} \\ \mu= \text {overall mean}\\ g_{i}=\text{random effect of the $i$th genotype}\\ r_{j}=\text{fixed effect of the $j$th replication}\\ b_{jk}= \text {random effect of $k$th block nested within $j$ replication}\\ \varepsilon_{ijk}=\text{residual error}\\ \text{here we assume errors are independent and identically distributed }\epsilon\sim \text{$iid$N}(0,\sigma_\epsilon^2)\\ \]


> # Now apply model
>   model1<-lmer(Yield~Rep+(1|Genotype)+ (1|Rep:Block), data =sub.data)

Results

  • Here we will summarize the results using summary() function. The first few lines of output indicate that the model was fitted by REML as well as the value of the REML criterion. The second piece of the summary output provides information regarding the random-effects and residual variation. The third piece of the summary output provides information regarding the fixed-effects and the fourth piece of summary output provides information regarding the correlation of fixed effects.
> # Summarise the results
>   summary(model1)
Linear mixed model fit by REML ['lmerMod']
Formula: Yield ~ Rep + (1 | Genotype) + (1 | Rep:Block)
   Data: sub.data

REML criterion at convergence: 6239.3

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.90048 -0.59387  0.03899  0.60311  1.71001 

Random effects:
 Groups    Name        Variance Std.Dev.
 Genotype  (Intercept) 431861   657.2   
 Rep:Block (Intercept)  28499   168.8   
 Residual              193255   439.6   
Number of obs: 394, groups:  Genotype, 197; Rep:Block, 10

Fixed effects:
            Estimate Std. Error t value
(Intercept)  5233.19      94.20  55.552
Rep2           60.38     115.60   0.522

Correlation of Fixed Effects:
     (Intr)
Rep2 -0.614

Extract variance components

  • Here we will extract variance components
>   Ve<- VarCorr(model1)
>   Ve
 Groups    Name        Std.Dev.
 Genotype  (Intercept) 657.16  
 Rep:Block (Intercept) 168.82  
 Residual              439.61  

Plot the residual vs fitted plot

  • Here will show how to check for check for homoscedasticicty
> # Plot the residual plot
>   plot(fitted(model1), resid(model1), type="pearson")
>   abline(0,0, col="blue")

> # Plot QQ plot
>   qqnorm(resid(model1))

> # Residual plot
>   plot(residuals(model1,type="pearson"), main='Model residuals', 
+   ylab='Pearson residual value')

ANOVA for fixed effects

> # ANOVA
>   anova(model1)
Analysis of Variance Table
    npar Sum Sq Mean Sq F value
Rep    1  52724   52724  0.2728

Extract the Fixed effects

  • Here will show how to extract the BLUEs.
>   BLUEs<-fixef(model1)
>   BLUEs
(Intercept)        Rep2 
  5233.1856     60.3794 

Extract the Random effects

  • Here will show how to extract the BLUPs.
> # Extract the Random effects
>   BLUPs<-data.frame(Blups.yield=ranef(model1)$Genotype)
>   GV<-data.frame(BLUps.GY=coef(model1)$Genotype[,1]) #Genotype values (Blups +Intercept)

Heritability

  • Here will show how to calculate the heritability. Two approaches will be show how to estimate heritability: 1) Based on Variance components and 2) Based on Cullis et al. 2006 is also ….\(1-\frac{\overline{V}_{BLUp}}{2\sigma^{2}g}\). Where \(\overline{V}_{BLUP}\) is mean variance difference of two genotypes based on BLUPs and \(\sigma^{2}g\) is variance of genotypes.
> # Extract the variance components
>   Ve<- data.frame (VarCorr(model1))
>   Ve
        grp        var1 var2      vcov    sdcor
1  Genotype (Intercept) <NA> 431860.94 657.1613
2 Rep:Block (Intercept) <NA>  28498.55 168.8151
3  Residual        <NA> <NA> 193254.94 439.6077
> # Now calculate heritability using variance components
>   genotype.var=Ve[1,4]
>   error.var=Ve[2,4]
> # Now heritability
>   h2=genotype.var/(genotype.var+error.var)*100
>   h2
[1] 93.8095
> # Reliability
>   std.err<-se.ranef(model1)$Genotype
>   v_BLUP<- mean(std.err)
> # Heritability/Reliability 
>   h2<- (1-((v_BLUP)^2/(Ve[1,4]*2)))*100
>   h2
[1] 90.55036

Run the Analysis for all Environments

> # Run the analysis and check reliability
>   # For Non-Stress Data using DTF as co-variate
>   demo.data.filtered$Environment<- as.character(demo.data.filtered$Environment)
>   un.exp<- unique(demo.data.filtered$Environment)
>   for(i in 1:length(un.exp)){
+     sub<- droplevels.data.frame(demo.data.filtered[which(demo.data.filtered$Environment==un.exp[i]),])
+ 
+       model<-lmer(Yield~Rep+(1|Genotype)+ (1|Rep:Block), data =sub)
+       #BLUPs<-data.frame(Blups.yield=ranef(model)$Genotype, Environment=un.exp[i])
+         BLUPs<-data.frame(BLUps.GY=coef(model)$Genotype[,1], Environment=un.exp[i])
+     if(i>1){
+       BLUPs.all<-rbind(BLUPs.all, BLUPs)
+     }
+     else{
+       BLUPs.all<- BLUPs
+     }
+   }
> # Save the BLUES out put file for Genomic Predictions
>  #estimates.all$Genotype<-gsub("^.{8}", "",  estimates.all$Genotype)

MET Analysis


Model 2.lme4

  • Here we will analyze all the environments jointly and extract the single BLUE for each genotype. We will use mixed model analysis in lme4 r package model. We will treat genotypes as fixed and environment as random effect.

Combined ANOVA

  • Here ANOVA will be generated for all the factor levels.

  • Replications are nested with environments and Blocks are within Replications which are nested within environment.

> # Linear model to get ANOVA
>   demo.data.filtered$Environment<-as.factor(demo.data.filtered$Environment)
>   model.anova<-lm(formula = Yield~Genotype+Environment+Genotype*Environment+Environment:Rep+ Environment:Rep:Block,
+     data=demo.data.filtered)
> # Get ANOVA
>   anova(model.anova)
Analysis of Variance Table

Response: Yield
                        Df     Sum Sq   Mean Sq   F value    Pr(>F)    
Genotype               199 1185402954   5956799   14.5851 < 2.2e-16 ***
Environment              9 7105461322 789495702 1933.0651 < 2.2e-16 ***
Genotype:Environment  1779 3073244084   1727512    4.2298 < 2.2e-16 ***
Environment:Rep         10   57746059   5774606   14.1390 < 2.2e-16 ***
Environment:Rep:Block   76   57387089    755093    1.8488  1.72e-05 ***
Residuals             1890  771907223    408417                        
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Significant differences are observed for all factors and genotype by environment interactions are significant

Check for Homogeneity of Variance

  • Some test can be used to check variance between pair of environments as given below:

  • More on this can be found on this: Source 1, Source 2

  • Here we will check the distribution of residuals and see how they vary as we have more than two environments. For that we will run the mixed model in lme4 and then plot the residuals

> # 
> model2<- lmer(Yield~Rep+(1|Genotype)+(1|Environment)+
+           (1|Environment:Rep)+(1|Environment:Rep:Block),
+           data=demo.data.filtered)
> 
> #plot residuals 
> plot(residuals(model2,type="pearson"), main='Model residuals', 
+ ylab='Pearson residual value')

> #var.test(Yield~Environment,data=demo.data.filtered)

From the plot it is clear that residuals are not same and highly different

Combined Analysis in lme4

  • The model we will use is give below:

\[ y_{ijkl}= \mu+g_{i} + e_{j}+ (ge)_{ij}+r_{jk}+ b_{jkl} +\epsilon_{ijklm}\\ \mu= \text {overall mean}\\ g_{i}=\text{random effect of the $i$th genotype}\\ e_{j}=\text{random effect of the $j$th environment}\\ (ge)_{ij}=\text{is the interaction effect of $i$th genotypes with the $j$th environment}\\ r_{jk}=\text{fixed effect of the $k$th replication nested within $j$th environment}\\ b_{jkl}= \text {random effect of $l$th block nested with $j$ environment and $k$th replication}\\ \varepsilon_{ijkl}=\text{residual error}\\ \text{here we assume residuals are independent and identically distributed}\\ \]


  • Mixed models are powerful tools to handle assumptions of linear model Read this one

  • We will extract variance components and also calculate heritability.

> demo.data.filtered$Environment<-as.factor(demo.data.filtered$Environment)
> Model3.lme4<-lmer(Yield~Genotype+(1|Rep)+(1|Environment:Genotype)+
+           +(1|Environment:Rep:Block), data=demo.data.filtered)

Summary of MET results

  • In summary we will get following summarized results: 1) Description of model we used, 2) Random effects and varainces, 3) Fixed effects, 4) Correlation of fixed efefcts
>   summary(Model3.lme4)
Linear mixed model fit by REML ['lmerMod']
Formula: Yield ~ Genotype + (1 | Rep) + (1 | Environment:Genotype) + +(1 |  
    Environment:Rep:Block)
   Data: demo.data.filtered

REML criterion at convergence: 62923.7

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.5042 -0.4373 -0.0158  0.4092  4.1520 

Random effects:
 Groups                Name        Variance  Std.Dev. 
 Environment:Genotype  (Intercept) 6.559e+05 8.099e+02
 Environment:Rep:Block (Intercept) 1.859e+06 1.363e+03
 Rep                   (Intercept) 5.465e-03 7.393e-02
 Residual                          4.086e+05 6.393e+02
Number of obs: 3964, groups:  
Environment:Genotype, 1988; Environment:Rep:Block, 96; Rep, 2

Fixed effects:
              Estimate Std. Error t value
(Intercept)  4006.5428   325.7186  12.301
Genotype2    -158.3061   418.7820  -0.378
Genotype3    -100.7883   416.7684  -0.242
Genotype4     685.7349   416.6863   1.646
Genotype5     395.5221   416.9535   0.949
Genotype6     409.2635   415.8521   0.984
Genotype7    -355.5394   416.8936  -0.853
Genotype8     546.5261   416.4542   1.312
Genotype9    -480.3568   416.5919  -1.153
Genotype10   -474.1254   418.4225  -1.133
Genotype11    589.1146   416.2162   1.415
Genotype12    215.6045   416.2547   0.518
Genotype13     49.1649   416.0796   0.118
Genotype14    477.5898   416.8830   1.146
Genotype15   -260.8794   416.2345  -0.627
Genotype16   -223.6248   428.3856  -0.522
Genotype17    241.2194   416.6967   0.579
Genotype18    197.2838   416.2462   0.474
Genotype19   -411.0785   416.8845  -0.986
Genotype20    377.8620   416.5646   0.907
Genotype21    -26.2123   415.8055  -0.063
Genotype22    489.4758   416.5019   1.175
Genotype23      0.6426   416.7124   0.002
Genotype24    642.2857   419.0594   1.533
Genotype25    304.2129   416.6928   0.730
Genotype26     -2.0851   416.2181  -0.005
Genotype27   -309.2393   416.5684  -0.742
Genotype28   -258.6101   416.7560  -0.621
Genotype29    165.2624   417.9146   0.395
Genotype30    104.9037   415.9059   0.252
Genotype31   -670.4483   416.7135  -1.609
Genotype32    583.0878   416.5086   1.400
Genotype33    -14.6119   416.2335  -0.035
 [ reached getOption("max.print") -- omitted 167 rows ]

Plot of model

  • With the plot function model we will get the residuals vs fitted values
> plot(Model3.lme4)

Extract the variance components

> Ve<- data.frame (VarCorr(Model3.lme4))
> Ve
                    grp        var1 var2      vcov     sdcor
1  Environment:Genotype (Intercept) <NA>  655401.3  809.5686
2              Genotype (Intercept) <NA>  209278.7  457.4698
3 Environment:Rep:Block (Intercept) <NA>   18006.0  134.1864
4       Environment:Rep (Intercept) <NA> 2002963.3 1415.2608
5              Residual        <NA> <NA>  408863.4  639.4243

Heritability

  • Here will estimate the combined heritability based on Cullis et al.2006
> #std.err<-se.ranef(Model3.lme4)$Genotype
> #v_BLUP<- mean(std.err)
> # Heritability/Reliability 
> #h2<- (1-((v_BLUP)^2/(Ve[2,4]*2)))*100
> #h2

BLUEs for Random Effects

> # BLUEs
>   BLUEs.all<-data.frame(BLUEs.Yield=fixef(Model3.lme4))
>   #BLUPs<-data.frame(BLUps.GY=coef(Model3.lme4)$Genotype[,1])
>   head(BLUEs.all)
            BLUEs.Yield
(Intercept)   4006.5428
Genotype2     -158.3061
Genotype3     -100.7883
Genotype4      685.7349
Genotype5      395.5221
Genotype6      409.2635

Accounting Spatial Varaibility

> library(statgenSTA)
> TD_STA <- createTD(sub.data, 
+                    trial = "Environment",
+                    genotype = "Genotype", 
+                    rowCoord = "Row", 
+                    colCoord = "Column",
+                    repId = "Rep",
+                    subBlock = "Block",
+                    trDesign = "res.ibd")
> 
> ## Create layout plot with variety labels
> plot(TD_STA, 
+         plotType = "layout",
+         showGeno = TRUE)

> ## Model specification (using engine = "SpATS")
> sta_model_SpATS <- fitTD(TD = TD_STA, 
+                     traits = "Yield",
+                     design = "res.ibd",
+                     what = "fixed",
+                     spatial = TRUE,
+                     engine = "SpATS")
> 
> plot(sta_model_SpATS, 
+      plotType = "spatial", 
+      traits = 'Yield')

> ## Extract all available statistics from the fitted model.
>   extr <- extractSTA(sta_model_SpATS)
> ## Extract only the BLUEs from the fitted model.
>   BLUEs <- extractSTA(sta_model_SpATS,
+                     what = "BLUEs")

Additional on MET and Stability Analysis

  • Here in this section we are giving some useful R resources that can be used for stability and MET analysis.
  1. metan-R: Multi-environment Trial Analysis

  2. gge-R: Functions for GGE and GGB



  1. IRRI,Rice Breeding Innovations, ↩︎